Optimal Lempel-Ziv based lossy compression for memoryless data: how to make the right mistakes
Authors
Abstract
Compression refers to encoding data using bits, so that the representation uses as few bits as possible. Compression can be lossless (i.e., the encoded data can be recovered exactly from its representation) or lossy, where the data is compressed more than in the lossless case but can still be recovered to within a prespecified distortion under a given distortion metric. In this paper, we prove the optimality of Codelet Parsing, a quasi-linear time algorithm for lossy compression of sequences of bits that are independently and identically distributed (iid), under Hamming distortion. Codelet Parsing extends the lossless Lempel-Ziv algorithm to the lossy case, a task that has been a focus of the source coding literature for the better part of two decades. Given an iid sequence x, the expected length of the shortest lossy representation such that x can be reconstructed to within distortion D is given by the rate distortion function, r(D). Codelet Parsing splits the input sequence naturally into phrases, representing each phrase by a codelet, a potentially distorted phrase of the same length. The codelets in the lossy representation of a length-n string x have length roughly (log n)/r(D), and, like the lossless Lempel-Ziv algorithm, Codelet Parsing constructs codebooks logarithmic in the sequence length.
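To make the quantities in the abstract concrete, the following is a minimal sketch, not taken from the paper, of how r(D) and the approximate codelet length (log n)/r(D) could be evaluated for a Bernoulli(p) bit source under Hamming distortion. It uses the standard closed form r(D) = h(p) - h(D) for 0 <= D < min(p, 1 - p), where h is the binary entropy function. The function names, the choice of base-2 logarithms, and the example parameters are assumptions made for illustration only.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def rate_distortion_bernoulli(p: float, d: float) -> float:
    """Rate distortion function of a Bernoulli(p) source under Hamming distortion.

    Standard closed form: r(D) = h(p) - h(D) for 0 <= D < min(p, 1 - p),
    and r(D) = 0 once D reaches min(p, 1 - p).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if d < 0.0:
        raise ValueError("distortion D must be non-negative")
    if d >= min(p, 1.0 - p):
        return 0.0
    return binary_entropy(p) - binary_entropy(d)


def approx_codelet_length(n: int, p: float, d: float) -> float:
    """Approximate codelet length (log n) / r(D) for a length-n input.

    Assumption: the logarithm is base 2, matching rates measured in bits.
    """
    r = rate_distortion_bernoulli(p, d)
    if r == 0.0:
        raise ValueError("r(D) = 0, so (log n) / r(D) is undefined")
    return math.log2(n) / r


if __name__ == "__main__":
    # Hypothetical example: fair coin flips, about 11% of bits allowed to differ.
    n, p, d = 2**20, 0.5, 0.11
    print(f"r(D) = {rate_distortion_bernoulli(p, d):.4f} bits per source bit")
    print(f"codelet length is roughly {approx_codelet_length(n, p, d):.1f} bits")
```

With these example values r(D) is close to 0.5, so the codelets for a sequence of about a million bits would be roughly 40 bits long. The abstract only gives the order of the codelet length, so the sketch should be read as an order-of-magnitude guide rather than an exact figure.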
Similar resources
A Lossy Data Compression Based on String Matching: Preliminary Analysis and Suboptimal Algorithms
A practical suboptimal algorithm (source coding) for lossy (non-faithful) data compression is discussed. This scheme is based on approximate string matching, and it naturally extends the lossless (faithful) Lempel-Ziv data compression scheme. The construction of the algorithm is based on a careful probabilistic analysis of an approximate string matching problem that is of independent interest. This ...
An implementable lossy version of the Lempel-Ziv algorithm - Part I: Optimality for memoryless sources
A new lossy variant of the Fixed-Database Lempel–Ziv coding algorithm for encoding at a fixed distortion level is proposed, and its asymptotic optimality and universality for memoryless sources (with respect to bounded single-letter distortion measures) is demonstrated: As the database size m increases to infinity, the expected compression ratio approaches the rate-distortion function. The comp...
Complexity-compression tradeoffs in lossy compression via efficient random codebooks and databases
The compression-complexity trade-off of lossy compression algorithms that are based on a random codebook or a random database is examined. Motivated, in part, by recent results of Gupta-Verdú-Weissman (GVW) and their underlying connections with the pattern-matching scheme of Kontoyiannis’ lossy Lempel-Ziv algorithm, we introduce a non-universal version of the lossy Lempel-Ziv method (termed LLZ)...
On Efficient Entropy Approximation via Lempel-Ziv Compression
We observe a classical data compression algorithm due to Lempel and Ziv, well-known to achieve asymptotically optimal compression on a wide family of sources (stationary and ergodic), to perform reasonably well even on short inputs, provided the source is memoryless. More precisely, given a discrete memoryless source with large alphabet and entropy bounded away from zero, and a source sequence ...
Journal title:
- CoRR
Volume: abs/1210.4700
Publication date: 2012